ABSTRACT

The BRENDA (BRaunschweig ENzyme DAtabase) 
enzyme portal (http://www.brenda-enzymes.org) 
is the main information system of functional bio- 
chemical and molecular enzyme data and provides 
access to seven interconnected databases. 
BRENDA contains 2.7 million manually annotated 
data on enzyme occurrence, function, kinetics and 
molecular properties. Each entry is connected to a 
reference and the source organism. Enzyme ligands 
are stored with their structures and can be 
accessed via their names, synonyms or via a struc- 
ture search. FRENDA (Full Reference ENzyme DAta) 
and AMENDA (Automatic Mining of ENzyme DAta) 
are based on text mining methods and represent a 
complete survey of PubMed abstracts with informa- 
tion on enzymes in different organisms, tissues or 
organelles. The supplemental database DRENDA
provides more than 910 000 new EC number-disease
relations in more than 510 000 references
from automatic search and a classification of
enzyme-disease-related information. KENDA
(Kinetic ENzyme DAta), a new amendment, extracts
and displays kinetic values from PubMed abstracts.
The integration of the EnzymeDetector offers an 
automatic comparison, evaluation and prediction 
of enzyme function annotations for prokaryotic 
genomes. The biochemical reaction database 
BKM-react contains non-redundant enzyme- 
catalysed and spontaneous reactions and was de- 
veloped to facilitate and accelerate the construction 
of biochemical models. 



INTRODUCTION 

BRENDA (BRaunschweig ENzyme DAtabase,
http://www.brenda-enzymes.org) is the major information
system for enzyme-related research. Development
started 25 years ago, and the data were made available
via the internet in 1998 with a first query system. Since
then it has been continually updated and further developed
to meet the requirements of newly arising
branches of biomedical research such as systems biology or
metabolomic research. The database covers a wide range of
aspects of enzymology, including functional data such as
enzyme-catalysed reactions, kinetic data for catalysis and
enzyme inhibition, enzyme stability, purification,
crystallization or mutations, as well as the largest collection of
enzyme names and synonyms, altogether stored in ~50
information fields. Each data entry is connected to the
literature reference, to the name of the source organism,
and to the protein sequence identifier (if available). For
many organisms the cell type or a strain specification is
included. Enzymes from multicellular organisms (e.g.
mammals or plants) are further specified with respect to
their occurrence in body parts, organs or plant anatomy.
The terms used here are based on the BRENDA Tissue
Ontology (BTO) (1). This reference system has been
developed in parallel to the enzyme database as an
encyclopedia for tissue terms and cell types including synonyms
and definitions and currently holds ~5300 different terms.
The subcellular localization of an enzyme is described
using the terms of the Gene Ontology (GO) (2).

The term 'ligand' is used in BRENDA for all compounds
that interact with enzymes. These can be small
molecules like the metabolites of primary metabolism,
macromolecules, cosubstrates/cofactors or metal ions
which must be present for the activity of the enzyme.
Enzyme inhibitors represent a major portion of the
ligands. These can be of variable origin, like naturally
occurring antibiotics or synthetic chemicals analysed for
the development of drugs or pesticides. Macromolecules
like DNA, RNA, proteins or polysaccharides also interact
with enzymes as substrates, inhibitors or as agents that
regulate and modify the activity. The affinity of individual
ligands for an enzyme is stored in BRENDA as different
kinetic or association constants, such as Km, kcat and kcat/Km for
substrates or Ki and IC50 values for inhibitors.

For more than 130 000 BRENDA ligands the chemical
structure is stored. This allows the display of structural 
diagrams for the molecules and the chemical reactions. 
All stored ligand structures are searchable via a query 
based on a structure or substructure entered by the user. 
This is essential because of the variety of different names 
being used for compounds. On average ~20% of the 
ligands in BRENDA are stored with different names, 
some with 20 different names like the common adenosine 
5'-phosphate (AMP). Using the structural information the 
ligands in BRENDA are then linked to the ChEBI (3) 
database of chemical entities of biological interest. 

In many cases the biochemical reaction is defined using
generic terms for molecules. Examples are
(1→4)-α-D-glucooligosaccharide or aryl-β-D-glucoside. They
are stored in the database using the Markush concept
(with R as generic substituents). Since the huge number
of publications on enzyme properties does not allow the
manual annotation of the complete literature of all
enzymes, the manually annotated data are supplemented
with information retrieved by text mining procedures.
Three supplementary databases therefore complete the
information. The quality of automatically retrieved
information still lags behind that of manually annotated data.
However, text mining techniques are improving, and the
text mining procedures deployed for the automatically
generated databases are continually adapted to new
findings. Additionally, entries of these databases are
accompanied by reliability categories and quality scores.
These enable the user to judge the validity
of the provided information. The procedures are based on
the text interpretation of sentences containing enzyme
names, organism names, localization, and source tissues
in abstracts and titles of the PubMed database (4).
Although the use of the common name is strongly
recommended by the IUBMB (International Union of
Biochemistry and Molecular Biology) biochemical
nomenclature committee, we have found that more than
65 000 different names are in use for the enzymes of
~4800 approved EC classes (EC numbers issued by the
IUBMB) (5,6). These names were collected via the manual
annotation of ~120 000 literature references and are
stored as synonyms. This ensures that an enzyme is
found in the database even if a rarely used name is entered as a
query. For some EC classes such as the protein tyrosine
kinase (EC 2.7.10.1) more than 1300 names are found in
the literature. On average each EC class has 14 synonyms.
For the application in the text mining process this list is
curated in order to remove any non-specific enzyme
names. Other controlled vocabularies used in BRENDA
are the BTO for tissue distribution, the GO for subcellular
localization and the NCBI Taxonomy tree (7).



FRENDA (Full Reference ENzyme DAta) aims at 
providing an exhaustive collection of indexed literature
references containing organism-specific enzyme informa- 
tion by providing all combinations of enzyme names and 
organisms found in PubMed titles or abstracts together 
with the literature citation (8). 

AMENDA (Automatic Mining of ENzyme DAta) is a 
subset of FRENDA and comprises enzyme-specific infor- 
mation on the enzyme source based on the vocabulary of 
the BTO and the subcellular localization based on the GO 
terms. The results are classified into four reliability
categories depending on the occurrence of search terms 
in title and/or abstract and/or MeSH terms. 

KENDA (Kinetic ENzyme DAta) was developed to 
include additional functional kinetic enzyme data. This 
text mining approach extracts kinetic values and expres- 
sions from more than 2.2 million PubMed abstracts based 
on the results of the FRENDA database. 

DRENDA provides broad information on the connec- 
tion of diseases and enzymes. It is based on the analysis of 
disease-related enzyme information using a subset of 
MeSH terms. The results are classified into the categories 
causal interaction, therapeutic application, diagnostic usage, 
and ongoing research and presented together with the re- 
spective quality scores (9). 

For a complete picture on the enzyme properties 
BRENDA includes data from many external sources 
such as genome sequences [NCBI Genomes Server (10),
Ensembl database (11)], protein sequences [UniProt data- 
bases (12,13)], functional assignments [COG database 
(14)], molecular pathways [KEGG database (15)] and 
protein 3D-structures from the PDB (16). Links are 
provided for individual enzymes to metabolic databases 
such as KEGG or MetaCyc (17) and to the enzyme nomenclature
of the IUBMB. Computer programs are used
for the prediction of membrane-associated enzymes, for 
the display of active centers or the cofactor binding 
sites, or for the display of the enzymes in the genomic 
context (Genome Explorer). 

In this article, we give an overview on the enzyme data in 
BRENDA, AMENDA, FRENDA, and the newly de- 
veloped databases KENDA and DRENDA. Likewise 
available from the website are the newly developed 
integrated biochemical reaction database BKM-react (18) 
and the Enzyme Detector (19) which are described in detail. 

BRENDA DATA 

Enzyme classification 

The database covers all enzyme classes that have been 
classified by the IUBMB plus more than 300 others that
are not yet characterized well enough to be fully classified. 
Currently there are 4867 active EC classes plus 871 EC 
classes which have been deleted or transferred to other 
EC classes due to new research results. The number of 
EC classes is rising constantly. 

In the course of the manual literature annotation
process for BRENDA we frequently find enzymes that
are not yet classified. These are enzymes that differ
substantially in substrate specificity and reaction from all
current EC classes. They are enzymes closing a gap in a
metabolic pathway or enzymes of a new pathway detected
in a specific organism. These enzymes have to be classified
within the hierarchical EC system; proposals are therefore
prepared within the BRENDA team for the final decision
of the IUBMB Enzyme Commission, in which two
members of the BRENDA team are active. Even before classification
the newly detected enzymes are integrated into the
BRENDA database as 'preliminary BRENDA-supplied
EC number'. Instead of the common EC number these
entries carry a B and a serial number in the fourth
position of the EC number. Currently (July 2012)
BRENDA holds 321 preliminary EC numbers. A portion
of these numbers is under review at the IUBMB. All
preliminary BRENDA EC numbers are internally reviewed
regularly and, where available, new data are added.
Table 1 shows the preliminary BRENDA-supplied entries
of aspartic proteinases. None of these are currently under
review by the IUBMB because there is an ongoing
discussion on how to classify proteinases and how to describe
their specificity.
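
The numbering scheme described above (a B followed by a serial number in the fourth position) can be recognized mechanically. The following Python sketch is only an illustration: the names `EC_PATTERN`, `ECNumber` and `parse_ec` are hypothetical and are not part of any BRENDA interface, and the pattern assumes fully specified four-field EC numbers.

```python
import re
from typing import NamedTuple, Optional

# Four dot-separated fields; the fourth may carry a leading "B" marking a
# preliminary BRENDA-supplied number (e.g. 3.4.23.B6). Assumed format only.
EC_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(B?)(\d+)$")


class ECNumber(NamedTuple):
    ec_class: int
    subclass: int
    sub_subclass: int
    serial: int
    preliminary: bool


def parse_ec(text: str) -> Optional[ECNumber]:
    """Parse an EC number string; return None if it does not match."""
    # Remove stray spaces such as the one in "3.4.23. B14".
    match = EC_PATTERN.match(text.replace(" ", ""))
    if match is None:
        return None
    a, b, c, flag, serial = match.groups()
    return ECNumber(int(a), int(b), int(c), int(serial), flag == "B")
```

For example, `parse_ec("3.4.23.B6")` yields an entry flagged as preliminary, whereas `parse_ec("2.7.10.1")` is treated as a regular EC number.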



BRENDA content 

The BRENDA information system covers a wide range of
enzyme data for each EC class. In this respect it differs
from other enzyme databases, which specialize
either in nomenclature [ExplorEnz, ENZYME (20),
IntEnz (21)], in metabolic pathways (e.g. KEGG,
MetaCyc) or in specific classes of enzymes [e.g. Merops
(22), MODOMICS (23), CAZy (24), Kinomer (25)].

The data are either manually extracted from the
primary literature, integrated from other sources,
or obtained by text mining (and
disclosed as such). Each single entry is connected to a
reference, covering the literature between 1939 and 2012.
It is also connected to an organism and, where available,
to a strain and a sequence identifier for the
enzyme protein. This is stored in a database with more
than 200 million data entries, 2.7 million of which are
hand-annotated and stored in ~50 categories. These are
grouped into 'Nomenclature', 'Reaction and Specificity',
'Isolation and Preparation', 'Functional Parameters',



Table 1. Preliminary BRENDA-supplied entries of aspartic proteinases

EC class: 3.4.23.B6
Name: Mason-Pfizer monkey virus proteinase
Reaction and specificity: The enzyme cleaves 17 amino acids of the C-terminal 38-amino-acid cytoplasmic tail of the transmembrane protein TM of the released immature virus.

EC class: 3.4.23.B10
Name: Rous sarcoma virus retropepsin
Reaction and specificity: The cleavage sequence in the natural substrate NC-PR is PPAVS-/-LAMTMRR. The activity can be improved by substitution by Trp, Tyr, Phe, Leu, Arg, Glu, His or Ala in P1, Tyr in P3', and Arg, Phe, Asn or His in P3.

EC class: 3.4.23.B11
Name: Spumapepsin
Reaction and specificity: Good cleavage at the peptide bonds: Asn-Thr, Asn-Gln, Asn-Cys and Asn-Ala.

EC class: 3.4.23.B13
Name: Proteinase P15
Reaction and specificity: Efficient cleavage of Ala-Thr-His-Glu-Val-Tyr-Phe(NO2)-Val-Arg-Lys-Ala, no cleavage with Ser, Arg or Glu at P1, Gly or Phe at P2, and Pro at P3. Specifically liberates the five major structural proteins from the common gag precursor, as well as reverse transcriptase and integrase from the gag-pol precursor.

EC class: 3.4.23.B14
Name: Plasmepsin IV
Reaction and specificity: Cleavage of hemoglobin. In the S3 and S2 subsites, the plasmepsin 4 orthologs all prefer hydrophobic amino acid residues, Phe or Ile, but reject charged residues such as Lys or Asp. In S2' and S3' subsites these plasmepsins tolerate both hydrophobic and hydrophilic residues.

EC class: 3.4.23.B2
Name: Simian immunodeficiency virus proteinase
Reaction and specificity: The enzyme may have a wide substrate specificity. Good cleavage of the peptide bonds Met-Met and Tyr-Pro. Cleavage is also observed at Phe-Pro, Phe-Leu, Leu-Phe, Leu-Ala, Glu-Ala and Tyr-Ala.

EC class: 3.4.23.B3
Name: Equine infectious anemia virus proteinase
Reaction and specificity: Processing at the authentic HIV-1 PR recognition site and release of the mature p17 matrix and the p24 capsid protein, as a result of the cleavage of the -SQNY-/-PIVQ- cleavage site.

EC class: 3.4.23.B4
Name: Feline immunodeficiency virus protease
Reaction and specificity: The enzyme seems to have a preference for Val in P1' and Phe in P1. In contrast to the HIV-1 protease the feline immunodeficiency virus protease does not cleave the peptide KSGVFVQNGLVK at the Phe-Val bond. Gln in P2' may be inhibitory. In contrast to HIV-1 protease the feline immunodeficiency virus protease does not cleave peptide KSGNFVVNGLVK at the Phe-Val bond. Asn in P2 may be inhibitory.

EC class: 3.4.23.B5
Name: Murine leukemia virus protease
Reaction and specificity: Processing of viral polyprotein. The retroviral protease is essential for virus replication, by processing of viral Gag and Gag-Pol polyproteins.

EC class: 3.4.23.B8
Name: Human T-cell leukemia virus type 1 protease
Reaction and specificity: Processing at the authentic HIV-1 PR recognition site and release of the mature p17 matrix and the p24 capsid protein, as a result of the cleavage of the -SQNY-/-PIVQ- cleavage site.

EC class: 3.4.23.B9
Name: Bovine leukemia virus protease
Reaction and specificity: The best substrate YDPPAILPII bears the natural cleavage site between the matrix and the capsid proteins of the BLV Gag precursor polyprotein. Good cleavage of the peptide bonds: Leu-Pro, Leu-Val, Gly-Val and Leu-Pro.

EC class: 3.4.23.B1
Name: Napsin
Reaction and specificity: Proteolytic cleavage of polypeptides to large and stable peptides.

EC class: 3.4.23.B17
Name: Walleye dermal sarcoma virus proteinase
Reaction and specificity: Processing of viral polyprotein. Preference order for P1 position is Phe > Tyr > Leu, Met > Ala. Gly is preferred at position P3. Ala and Pro are preferred at position P4. Asn, Cys or Leu are preferred at position P2.

EC class: 3.4.23.B18
Name: Mouse mammary tumor virus retropepsin
Reaction and specificity: Processing of viral polyprotein. Selective for large aromatic residues (Tyr and Phe) at position P1. Phe and Leu are preferred at position P3. No hydrolysis of substrates with Gly or Ala at position P3. Medium-sized or large hydrophobic residues such as Ile, Leu and Phe are preferred at position P4.

EC class: 3.4.23.B19
Name: Plasmepsin V
Reaction and specificity: Cleavage of hemoglobin. In contrast to the food vacuole plasmepsins, detergent-solubilized PMV does not bind the aspartic protease inhibitor pepstatin.

EC class: 3.4.23.B20
Name: HycD peptidase
Reaction and specificity: This enzyme specifically removes a 15-amino acid peptide from the C-terminus of the precursor of the large subunit of hydrogenase 2 [UniProt: P0ACE0] in E. coli.
Table 2. Increase of the manually annotated data content in
BRENDA from 2008 to 2012

                         2008          2012          Increase from 2008 to 2012 (%)
EC classes               4824          5372          11.4
Enzyme names
Organisms                8930          10 511        17.7
Source/tissue            [illegible]   [illegible]   [illegible]
Localization             21 857        30 894        41.3
KM-values                89 012        121 298       36.3
Ki-values                19 018        33 819        77.8
kcat-values              29 149        49 224        68.9
Mutant properties        30 437        59 841        96.6
Ligand names             119 315       177 850       49.1
Ligand structures        48 651        103 222       112.2
Substrates/products      244 236       324 591       32.9
Inhibitors               127 146       203 727       60.2
Stability information    36 683        44 716        21.9
References               84 607        126 405       49.4

The numbers refer to the combination of enzyme-organism-(protein)-
value. The numbers of the Ligand Names and Ligand Structures,
the Organism and the References specify the unique entries in
BRENDA.



'Organism-related Information', 'Stability', 'Enzyme
Structure', 'References' and 'Application & Engineering'.
Each EC class is updated annually: selected new references
are annotated and included. Table 2 shows the increase in
the amount of data since 2008 in selected areas of the
database.
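
The percentages in the last column of Table 2 are consistent with a simple relative increase. The formula below is an assumption, since the text does not state how the column was computed; the worked check uses the EC classes row:

```latex
\[
\text{Increase}\,(\%) = \frac{N_{2012} - N_{2008}}{N_{2008}} \times 100
  = \frac{5372 - 4824}{4824} \times 100 \approx 11.4
\]
```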

The section 'Reaction and Specificity' lists all reactions
which have been found in the annotated literature. These
reactions are not restricted to naturally occurring sub- 
strates but also comprise synthetic substrates. The latter 
give valuable information on possible applications of the 
enzyme in biotechnology, in the food industry or for agri- 
cultural purposes. Synthetic substrates are also widely 
used for the determination of substrate specificity. The 
reactions, substrates, products, inhibitors or cofactors 
can be displayed as chemical diagrams. 

Kinetic data are stored to evaluate the efficiency of an
enzyme and provide insights into the catalytic process.
In BRENDA several kinds of kinetic data are stored:
Km, kcat, Ki and IC50 values. In addition to the numerical
value, recalculated to standard units, and the substrates
or inhibitors, these data fields contain essential information
stored in the commentary section, e.g. information
on the experimental conditions, the isoenzymes or mutant
forms. The comparison of kinetics for wild-type,
mutant or enzymes produced by site-directed mutagenesis
gives insight into the catalytic process. When a mutation
is associated with a disease the altered kinetic behavior can
lead to valuable conclusions and possibly to a treatment.
In BRENDA such data can be accessed, for example, in the
Km search field by typing 'mutant' in the commentary
search box. The optimal temperature and pH are also
given. However, the kinetic data are not always recorded
at the optimal temperature and pH, for experimental
reasons such as the instability or insolubility of the
substrates.



Organism-related data 

Each data entry in BRENDA is linked to the name of the 
source organism and, where available to a strain name and 
a protein identifier (generally UniProt accession codes). 
Currently BRENDA stores enzyme data for 10480 differ- 
ent organisms. 

Sequence and structure data 

The section 'Enzyme Structure' contains 2.37 million links
to the protein sequences of the UniProt database and 
78 000 3D-structures of the Protein Data Bank. These 
data are used for the visualization of the 3D-structures 
using JMOL (26), which is integrated into the BRENDA 
website. Structural and functional features, such as active 
sites or binding sites can be displayed in the representation 
of the enzyme (27). 

Sequence data are used to predict the number, the size
and the location of transmembrane helices using
TMHMM [TransMembrane Hidden Markov Model, (28)].

AMENDA and FRENDA 

While the manually annotated BRENDA database aims at
selectively providing enzyme functional data, it cannot
provide a full record and annotation of the complete
literature published in enzymology. The databases
FRENDA and AMENDA complement the information
with data derived from text mining. FRENDA provides
links to all PubMed references that cover enzyme-specific
information in combination with the name of the organism
or one of its synonyms. AMENDA is a subset of FRENDA
and specifies the occurrence in tissues, organs and the
subcellular localization. Here the text mining procedure is
more refined and the vocabularies have been intensively
curated. The results are ranked into four reliability categories.
The current FRENDA database stores ~1.9 million
reference data for enzymes. The subset AMENDA contains
1.1 million ranked entries. Of these ~230 000 give information
on the occurrence in tissues linked to the BTO and
52 000 define the subcellular localization.

NEW DEVELOPMENTS 

Kinetic ENzyme DAta 

BRENDA contains more than 285 000 manually
annotated kinetic values. Nevertheless they cannot cover
the entire enzyme literature. Recently this gap was
addressed with data from a text mining method adapted from
a previously developed method (29). The procedure is
based on text interpretation and supported by dictionaries
with ~2000 collected kinetic terms and units (including
different spellings) and ~2000 terms for the interpretation of
the sentence structure (e.g. negations, markers for listings).
Approximately 180 000 names for metabolites, inhibitors
and other compounds are taken from the BRENDA
database. A total of 63 103 enzyme names and 5048 EC
classes together with 775 684 organism names and their
synonyms from the FRENDA database are included in
the search. The method is performed on 2.2 million
PubMed abstracts, representing only those where enzymes
have been found by FRENDA, and extracts kinetic values
in 14 categories (Km, Ki, kcat, kcat/Km, Vmax, IC50, S0.5,
Kd, Ka, t1/2, pI, nH, specific activity, Vmax/Km). The
procedure involves the following steps:

- Abstracts are split into sentences. Sentences and titles 
(handled as sentences) are stored in the results database. 

- If an organism or an enzyme was found by FRENDA 
the sentence ID is stored along with the organism and/ 
or enzyme (systematic name or EC class) in the results 
database. 

- Kinetic expressions are extracted from sentences with 
enzymes. 

- Kinetic units are extracted from sentences with kinetic 
expressions. 

- Values, ligands and other terms (binding words,
markers for listings etc.) are extracted from sentences
with kinetic units.

- Sentences with enzyme, kinetic expression, kinetic unit 
and number are further processed: 

- Nested terms are removed. 

- Values and units are merged. 

- Values and terms are removed if headed or followed 
by a marker for removal. 

- Lists are identified. 

- Remaining sentences are stored into the results 
database. 

- If no organism is found in the sentence and only one 
organism has been found in the abstract, this 
organism is handled as if found in the sentence. 
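
The first steps of this procedure (sentence splitting, then extraction of kinetic expression, value and unit) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the KENDA implementation: the two small dictionaries, the regular expressions and the function names are hypothetical, and the enzyme and organism recognition provided by FRENDA, the removal markers and the list handling are omitted.

```python
import re

# Hypothetical mini-dictionaries standing in for the ~2000-term dictionaries.
KINETIC_EXPRESSIONS = ("IC50", "Km", "Ki", "kcat")
KINETIC_UNITS = ("microg mL(-1)", "microM", "mM", "nM", "s(-1)")

_EXPR = "|".join(re.escape(e) for e in KINETIC_EXPRESSIONS)
# Longest units first so that a longer unit is never shadowed by a shorter one.
_UNIT = "|".join(re.escape(u) for u in sorted(KINETIC_UNITS, key=len, reverse=True))

# expression, optional linking word or "=", number, unit
VALUE_PATTERN = re.compile(
    rf"\b(?P<expr>{_EXPR})\b\s*(?:=|of|was|is)?\s*"
    rf"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>{_UNIT})"
)


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' when followed by
    whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", abstract)
    return [p.strip() for p in parts if p.strip()]


def extract_kinetic_values(abstract):
    """Return one record per kinetic value found, tagged with its sentence ID."""
    records = []
    for sentence_id, sentence in enumerate(split_sentences(abstract)):
        for match in VALUE_PATTERN.finditer(sentence):
            records.append({
                "sentence_id": sentence_id,
                "expression": match.group("expr"),
                "value": float(match.group("value")),
                "unit": match.group("unit"),
            })
    return records
```

Keeping the sentence ID with every record mirrors the step in which sentence IDs are stored alongside the recognized terms, so that a value can later be joined with the enzyme and organism found in the same sentence.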

The database contains 20 817 kinetic data, each stored as a
kinetic value connected to a kinetic expression, to the
unit, to the EC class, to a reference and to a source
organism if mentioned in the text. The new data are
accessed from the homepage by selecting
'Kinetic ENzyme DAta'. Km and IC50 values represent
the large majority of the values found. Figure 1 shows
the relative distribution of the kinetic categories in the
results. The analysed PubMed abstracts can be displayed
to view the context. Figure 2 shows a typical abstract
which has been analysed with KENDA.

Disease-Related ENzyme information DAtabase 

Due to their crucial role in all aspects of
life, including metabolism, regulation, immunity, etc., the
absence or malfunction of enzymes leads to severe
pathological conditions in an organism and may manifest itself in
a disease.

The Disease RElated ENzyme information DAtabase 
DRENDA represents a supplemental database to 



Figure 1. Relative distribution of the kinetic categories in the results of
the KENDA data.



Inhibitory effect of koji Aspergillus terreus on alpha-glucosidase activity and postprandial hyperglycemia.
Deni, RT; Iskandar, VM; Hanafi, M; Kardono, LB; AngHbut, M; Dewijand, ID; Banjamahor, SD;
Pak J Biol Sci; 10; 3131-5 (2007) 19090111

The compounds that could inhibit the activity of α-glucosidase are potentially used for antidiabetic by suppressing
postprandial hyperglycemia. This research aimed to investigate the hypoglycemic activity in A. terreus koji
extracted by ethyl acetate. The extracts was dissolved in methanol: water (1:4), followed by fractionations with
n-hexane, methylene chloride and ethyl acetate. Each fraction was assayed for its activity against α-glucosidase.
The active fraction was purified by column chromatography using silica gel and resin as adsorbent. The koji extract
showed potential as alpha-glucosidase inhibition with IC50 10 microg mL(-1) and showed combination of
non-competitive and uncompetitive inhibition mode against α-glucosidase. ethyl acetate fraction showed potential
as inhibitor alpha-glucosidase with IC50 = 8.6 microg mL(-1). In animal experiment, active fraction (FlO-4) of
ethyl acetate fraction suppressed the increase of postprandial blood glucosidase level compare to the control. Thus
it showed potential as alpha-glucosidase inhibitor and demonstrated depressed postprandial blood glucose level and
may have potential use in the management of type 2 diabetes.

Highlight legend: Enzyme, Organism, Ligand, Kinetic Expression, Kinetic Unit

Figure 2. Abstract from PubMed (ID 19090111) for α-glucosidase showing the kinetic values highlighted by the KENDA procedure.
BRENDA including classified literature links and an
analysis of enzyme/disease relations. In a first step
enzyme-related information on diseases in abstracts of
the PubMed database was retrieved by automatic text
analysis supported by vocabularies from BRENDA (EC
numbers, enzyme names, ~100 000 items) and MeSH
terms from the NCBI for diseases and metabolic disorders
(~22 000 terms). This step resulted in 0.9 million
incidences of enzymes and diseases in the literature. Table 3
summarizes results for the most frequently described
enzyme-disease relationships and the EC classes which
have been found to be connected with the highest
number of different diseases.

However, a simple listing of references with enzyme
names which are connected with a disease name is not
of much value for the researcher. In order to assess the
kind of connection or dependency, the relationship was
classified into four categories:

(1) 'Causal Interaction' of a disease caused by a mal- 
function of the enzyme; 

(2) 'Ongoing Research' when the disease-enzyme rela- 
tionship is suspected but research is still under way; 



Table 3. Enzyme-disease-related data in the DRENDA database

Disease            PubMed IDs    EC no.
Neoplasm           110 485       1272
Infection          25 937        1067
Carcinoma          22 141        829
Breast neoplasm    18 053        667

EC class           PubMed IDs    Disease terms
2.7.10.1           28 683        1178
1.7.2.1            13 946        1058
1.14.99.1          10 902        897
3.4.15.1           11 650        689
3.4.21.68          11 939        864

(3) 'Diagnostic Usage' when the enzyme is part of the
diagnostic course of action, like the measurement of
its activity, the test for its presence or the assay of its
functional characteristic parameters; and

(4) 'Therapeutic Application' when the enzyme is applied 
as a therapeutic agent or considered as drug target. 

The results of the procedure were evaluated with respect
to precision, recall, F1 score, specificity and accuracy in
a 5-fold cross-validation process as shown in
Equations (1)-(5). This ensures high-quality data and
represents a true upgrade of the BRENDA data. Table 4
displays an overview of the data content and the results
of the evaluation procedure.



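\begin{align}
\text{Precision} &= \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \tag{1} \\
\text{Recall} &= \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \tag{2} \\
\text{F1 Score} &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3} \\
\text{Specificity} &= \frac{\text{true negatives}}{\text{false positives} + \text{true negatives}} \tag{4} \\
\text{Accuracy} &= \frac{\text{true positives} + \text{true negatives}}{\text{true positives} + \text{true negatives} + \text{false positives} + \text{false negatives}} \tag{5}
\end{align}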
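
The five measures can be computed directly from the four confusion-matrix counts. The Python sketch below simply restates Equations (1)-(5); the function names are illustrative, and all denominators are assumed to be non-zero.

```python
def f1_score(precision, recall):
    """F1 score as the harmonic mean of precision and recall (Equation 3)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Evaluation measures from counts of true/false positives/negatives.

    Assumes every denominator is non-zero; no special-casing is done.
    """
    precision = tp / (tp + fp)                    # Equation (1)
    recall = tp / (tp + fn)                       # Equation (2)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),        # Equation (3)
        "specificity": tn / (fp + tn),            # Equation (4)
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Equation (5)
    }
```

As a consistency check against Table 4, `f1_score(0.972, 0.530)` rounds to 0.686, the value reported for the first 'therapeutic application' row.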



Table 4. Results of the DRENDA validation process for the classification of the enzyme-disease relationships

Category                     Precision   Recall   F1 Score   Accuracy   Specificity   Entries
therapeutic application 4    0.972       0.530    0.686      0.750      0.984         114 477
therapeutic application 3    0.909       0.606    0.727      0.766      0.936         173 134
therapeutic application 2    0.900       0.818    0.857      0.859      0.903         330 335
therapeutic application 1    0.868       0.894    0.881      0.875      0.855         415 923
ongoing research 4           0.800       0.229    0.356      0.571      0.939         139 238
ongoing research 3           0.765       0.371    0.500      0.616      0.878         252 136
ongoing research 2           0.750       0.543    0.630      0.670      0.806         341 592
ongoing research 1           0.720       0.686    0.702      0.700      0.714         440 043
diagnostic usage 4           0.892       0.388    0.541      0.680      0.956         175 150
diagnostic usage 3           0.848       0.588    0.694      0.749      0.900         279 794
diagnostic usage 2           0.784       0.682    0.730      0.754      0.822         374 820
diagnostic usage 1           0.674       0.753    0.711      0.703      0.656         491 214
causal interaction 4         0.923       0.249    0.392      0.508      0.964         259 620
causal interaction 3         0.897       0.324    0.476      0.545      0.934         335 838
causal interaction 2         0.868       0.490    0.626      0.627      0.869         456 842
causal interaction 1         0.848       0.627    0.721      0.691      0.803         555 127

The integrated biochemical reaction database BKM-react

Information on organism-specific metabolic networks and
pathways is essential for many applications including
drug target identification, metabolic engineering, etc.
Within systems biology, genome-wide metabolic models
are developed that allow a prediction of cellular reactions
to changes in the environment or to genetic modifications. An
essential prerequisite for these applications is a complete
set of biochemical reactions for the construction of the
metabolic pathways. BRENDA contains a set of ~62 000
unique fully characterized enzyme-catalysed reactions
(plus ca. 27 000 reactions where, e.g., one of the
products was not explicitly determined in the paper).
About 10 000 of these are specified in the literature as
reactions occurring in the cell. About 8400 reactions are
stored in the KEGG Reaction database and MetaCyc
stores ~10 000 reactions.

The contents of these databases overlap to a
considerable degree. Because, as with enzymes, a large
number of different names are in use for the substrates
and products, the identity of two reactions is far from
obvious. Matching the compounds via their structures has
proved more promising. The procedure outlined in Figure 3
is based on a first comparison of the InChIs (30) generated
from molfiles plus a match with respect to the names, and
is described in detail in (17). The procedure led to a set
of 2890 reactions (15%) common to all databases. A further
10% occur in two databases, and the rest are unique to one
of the databases.
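
The structure-based matching can be illustrated with a minimal sketch. It assumes that each reaction has already been reduced to sets of InChI strings for its substrates and products; the `Reaction` type, the helper names and the sample records are hypothetical and are not taken from BKM-react. Two records count as the same reaction when their participant sets agree, regardless of the direction in which the reaction is written. Stoichiometry and the additional name matching are ignored here.

```python
from collections import defaultdict
from typing import NamedTuple

# Illustrative InChI strings for CO2, water and carbonic acid.
CO2 = "InChI=1S/CO2/c2-1-3"
H2O = "InChI=1S/H2O/h1H2"
H2CO3 = "InChI=1S/CH2O3/c2-1(3)4/h(H2,2,3,4)"


class Reaction(NamedTuple):
    source: str            # database the record was taken from
    substrates: frozenset  # InChIs of the left-hand side
    products: frozenset    # InChIs of the right-hand side


def reaction_key(rxn):
    """Direction-independent key: the unordered pair of participant sets."""
    return frozenset((rxn.substrates, rxn.products))


def group_by_identity(reactions):
    """Map each reaction key to the set of databases that contain it."""
    groups = defaultdict(set)
    for rxn in reactions:
        groups[reaction_key(rxn)].add(rxn.source)
    return dict(groups)


# Hypothetical records: the same reaction written in both directions.
records = [
    Reaction("BRENDA", frozenset({H2CO3}), frozenset({CO2, H2O})),
    Reaction("KEGG", frozenset({CO2, H2O}), frozenset({H2CO3})),
    Reaction("MetaCyc", frozenset({H2CO3}), frozenset({CO2, H2O})),
]

# Reactions found in all three sources.
shared = [key for key, sources in group_by_identity(records).items()
          if len(sources) == 3]
```

Using sets as keys makes the comparison independent of the order in which participants are listed, which is one plausible way to sidestep differences in how the source databases write a reaction.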

The often implicitly defined stereochemistry proves to
be the major problem. If one of the stereoisomers is found
much more frequently than the other, the stereodescriptors
are often omitted (e.g. the L in L-methionine or the D in
D-glucose). The databases follow different strategies to
cope with that problem. Assigning the α- or β-anomeric
form in carbohydrates is another difficulty.
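
One way to see the effect of omitted stereodescriptors is to compare InChIs with and without their stereo layers. The sketch below is an illustration only, not the strategy used by any of the databases: it assumes standard InChIs and drops the `/b`, `/t`, `/m` and `/s` layers before comparing. The function names are hypothetical, and alanine is used as the example compound.

```python
# Prefixes of the InChI stereo layers: double-bond (/b), tetrahedral (/t, /m)
# and stereo type (/s).
STEREO_LAYERS = ("b", "t", "m", "s")


def strip_stereo(inchi):
    """Return the InChI without its stereo layers."""
    parts = inchi.split("/")
    head, layers = parts[:2], parts[2:]  # version prefix and formula are kept
    kept = [layer for layer in layers if layer[:1] not in STEREO_LAYERS]
    return "/".join(head + kept)


def same_compound_ignoring_stereo(a, b):
    """True if two InChIs differ at most in their stereo layers."""
    return strip_stereo(a) == strip_stereo(b)


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

# An entry without stereodescriptors matches both enantiomers once the
# stereo layers are removed.
matches_l = same_compound_ignoring_stereo(ALANINE, L_ALANINE)
matches_d = same_compound_ignoring_stereo(ALANINE, D_ALANINE)
```

A stereo-insensitive match of this kind finds more candidate pairs but can also merge reactions that are genuinely distinct, so it would need a second, stricter check.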

BKM-react can be accessed via the BRENDA website
or with an independent URL. The query system allows
searches via EC numbers, pathways, substrate or
product names, or identifiers. Besides showing the
reactants, the result page also lists the stoichiometry and
unbalanced reactions.

[Figure 3 workflow boxes: BRENDA, KEGG, MetaCyc; extraction of all
enzyme-catalyzed and spontaneous reactions; matching of reaction
participants using InChIs; comparison of compound identifiers;
BKM-react online.]

Figure 3. Workflow for matching reactions and compounds in
BKM-react.
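
Whether a reaction is balanced can be checked by comparing element counts on both sides. The following is a minimal sketch under simplifying assumptions: formulas are plain element-count strings without parentheses or charges, and each side is a list of (coefficient, formula) pairs. The function names are hypothetical and do not describe how BKM-react performs this check.

```python
import re
from collections import Counter

# An element symbol followed by an optional count, e.g. "H2" or "O".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for symbol, number in _ELEMENT.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(side):
    """Sum atom counts over (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for symbol, n in atom_counts(formula).items():
            total[symbol] += coefficient * n
    return total


def is_balanced(substrates, products):
    """True if both sides contain the same number of each element."""
    return side_counts(substrates) == side_counts(products)


# 2 H2O2 -> 2 H2O + O2 is balanced; dropping the coefficients breaks it.
balanced = is_balanced([(2, "H2O2")], [(2, "H2O"), (1, "O2")])
unbalanced = is_balanced([(1, "H2O2")], [(1, "H2O"), (1, "O2")])
```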

The enzyme detector 

Several databases provide functional annotations of
genomes, based either on different computational methods
or on hand-curated annotations. An initial comparison of
the main annotation hosts for nine different prokaryotic
organisms revealed 70% inconsistencies. Therefore, we
implemented the annotation pipeline EnzymeDetector. This
tool automatically compares and evaluates the assigned
enzyme functions from the main annotation databases
NCBI, KEGG, PEDANT (31,32), Pseudomonas Genome
Database V2 (33) and SwissProt. The obtained data are
supplemented with our own function prediction, which is
based on a sequence similarity analysis, on manually
created organism-specific enzyme information from
BRENDA, and on sequence pattern searches (34). With
these integrated data the user can estimate the reliability
of the annotation predictions found. A customizable
scoring scheme was developed by comparing the annotations
found in the different sources to the manually curated
data and to the annotations found in SwissProt.
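
The idea of a customizable scoring scheme can be sketched as a weighted vote over annotation sources. The weights below are invented for illustration and are not the EnzymeDetector values; the function name and the input format, (source, EC number) pairs for a single gene, are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical per-source weights; the actual EnzymeDetector weights are
# not given in the text.
DEFAULT_WEIGHTS = {
    "SwissProt": 5.0,
    "BRENDA": 5.0,
    "KEGG": 2.0,
    "NCBI": 2.0,
    "PEDANT": 1.0,
    "BLAST": 1.0,
}


def score_candidates(annotations, weights=DEFAULT_WEIGHTS):
    """Rank EC candidates for one gene by the summed weight of their sources.

    annotations: iterable of (source, ec_number) pairs.
    Returns (ec_number, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for source, ec in annotations:
        scores[ec] += weights.get(source, 0.0)  # unknown sources add nothing
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Two lower-weighted sources agree, one higher-weighted source disagrees.
gene_annotations = [
    ("KEGG", "1.1.1.1"),
    ("NCBI", "1.1.1.1"),
    ("SwissProt", "1.1.1.2"),
]
ranking = score_candidates(gene_annotations)

# A user-defined scheme that trusts every source equally.
equal = score_candidates(gene_annotations, {"KEGG": 1, "NCBI": 1, "SwissProt": 1})
```

Passing the weights as a parameter is what makes the scheme customizable: the same annotations can rank differently under different trust assumptions.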

All data from the integrated sources are stored
in a database and can be accessed via the web interface of
the program. Results can be viewed in different ways:

• The tabular view shows the results sorted by gene
identifier. They can also be sorted by EC number or accepted
enzyme name. The search results can be downloaded
as a CSV file.

• The statistics view shows a statistical evaluation of the
results. It is possible to choose between a static view
applying the default settings or a dynamic view
which applies the user-chosen constraints.

• The annotation comparison view allows the comparison
of the enzymes of the selected organism to the
enzymes of one or two other organisms.

• The pathway view shows the total number of enzymes,
the found and the missing enzymes in the pathways of
KEGG and MetaCyc.

Figure 4 shows a statistical view of the search results for
Escherichia coli strain K-12 DH10B. The database is
updated bi-annually.
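
A downloaded CSV file from the tabular view could be re-sorted offline, for example by EC number. The sketch below assumes a hypothetical column layout (`gene_id`, `ec_number`, `enzyme_name`) and invented gene identifiers; the real export format may differ. EC numbers are compared numerically, level by level, because a plain string sort would place 1.1.1.27 before 1.1.1.3.

```python
import csv
import io


def ec_sort_key(ec):
    """Turn '1.1.1.27' into (1, 1, 1, 27); non-numeric parts such as '-' sort last."""
    return tuple(int(p) if p.isdigit() else float("inf") for p in ec.split("."))


def rows_sorted_by_ec(csv_text, column="ec_number"):
    """Parse CSV text and return its rows ordered by EC number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(reader, key=lambda row: ec_sort_key(row[column]))


# Hypothetical export; column names and gene identifiers are assumptions.
SAMPLE = """gene_id,ec_number,enzyme_name
gene_A,1.1.1.27,L-lactate dehydrogenase
gene_B,2.7.1.1,hexokinase
gene_C,1.1.1.3,homoserine dehydrogenase
"""

ordered = [row["ec_number"] for row in rows_sorted_by_ec(SAMPLE)]
```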



THE BRENDA PORTAL 

Searches for enzymes and enzyme data can be performed 
in multiple ways starting from the website. 

(1) The Quick Search provides direct searches for all of
the ~50 information fields as well as a full-text
search. The latter includes a search in all commentary
sections of the database which are not separately listed
on the website.

(2) The Advanced Search offers the possibility to combine 
many search criteria and search in a target-oriented 
manner thus avoiding overlong lists of results. 

[Figure 4: EnzymeDetector screenshot with the panels "Results",
"Statistics" and "Entries by database occurrence [%]".]

Figure 4. Statistical view of the search results for E. coli K-12 DH10B in EnzymeDetector.



(3) The Protein-specific Search opens a window for 
searching with UniProt sequence IDs or with 
protein sequences. 

(4) The Genome Explorer displays enzymes on genomes
and can also be used to compare enzyme localizations
in different organisms.

(5) The Substructure Search provides a tool for drawing 
molecules or fragments thereof. This gives access to 
the BRENDA 'Ligands' which comprise a large 
quantity of molecules that interact with enzymes 
such as substrates, products, inhibitors, etc. Ligands 
can also be accessed using their names and searching 
with Quick Search Ligand. 

(6) The Taxonomic Tree shows all organisms stored in
the respective NCBI database and provides links to
the BRENDA enzyme entries. Organisms as enzyme
sources can also be searched using Quick Search
Organism.

(7) The EC Explorer gives access to the hierarchical 
assembly of EC classes in a clearly arranged display. 
Enzyme data can be accessed from here as well. 

(8) The Ontology Explorer is the entry portal for a variety 
of anatomic, chemical or other ontologies. Enzymes 
connected with any of the terms can be displayed. 
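
The hierarchical arrangement of EC classes behind a tool such as the EC Explorer can be pictured as a nested tree keyed by the four levels of the EC number. This is a minimal sketch of that data structure, not a description of the BRENDA implementation; both function names are hypothetical.

```python
def build_ec_tree(ec_numbers):
    """Nest EC numbers by level: '1.1.1.1' becomes tree['1']['1']['1']['1']."""
    tree = {}
    for ec in ec_numbers:
        node = tree
        for level in ec.split("."):
            node = node.setdefault(level, {})
    return tree


def entries_under(tree, prefix):
    """List the full EC numbers below a class prefix such as '1.1'."""
    node = tree
    for level in prefix.split("."):
        if level not in node:
            return []          # the class is not present in the tree
        node = node[level]

    found = []

    def walk(subtree, path):
        if not subtree:        # a node without children is a full EC number
            found.append(".".join(path))
        for level, child in subtree.items():
            walk(child, path + [level])

    walk(node, prefix.split("."))
    return found


tree = build_ec_tree(["1.1.1.1", "1.1.1.2", "1.2.1.3", "2.7.1.1"])
subclass_entries = entries_under(tree, "1.1")
```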



at the website (http://www.brenda-enzymes.org/brenda_download/index.php).

License information can be viewed at
http://www.brenda-enzymes.org/index.php4?page=information/copy.php4.

Computer-based access to BRENDA is possible via
SOAP (http://www.brenda-enzymes.org/soap2).

Enzyme kinetic data can be obtained via an SBML
output.